An empirical evaluation of high utility itemset mining algorithms

نویسندگان

  • Chongsheng Zhang
  • George Almpanidis
  • Wanwan Wang
  • Changchang Liu
چکیده

High utility itemset mining (HUIM) has emerged as an important research topic in data mining, with applications to retail-market data analysis, stock market prediction, and recommender systems, etc. However, there are very few empirical studies that systematically compare the performance of state-of-the-art HUIM algorithms. In this paper, we present an experimental evaluation on 10 major HUIM algorithms, using 9 real world and 27 synthetic datasets to evaluate their performance. Our experiments show that EFIM and d2HUP are generally the top two performers in running time, while EFIM also consumes the least memory in most cases. In order to compare these two algorithms in depth, we use another 45 synthetic datasets with varying parameters so as to study the influence of the related parameters, in particular the number of transactions, the number of distinct items and average transaction length, on the running time and memory consumption of EFIM and d2HUP. In this work, we demonstrate that, d2HUP is more efficient than EFIM under low minimum utility values and with large sparse datasets, in terms of running time; although EFIM is the fastest in dense real datasets, it is among the slowest algorithms in sparse datasets. We suggest that, when a dataset is very sparse or the average transaction length is large, and running time is favoured over memory consumption, d2HUP should be chosen. Finally, we compare d2HUP and EFIM with two newest algorithms, mHUIMiner and ULB-Miner, and find these two algorithms have moderate performance. This work has reference value for researchers and practitioners when choosing the most appropriate HUIM algorithm for their specific applications. © 2018 Elsevier Ltd. All rights reserved.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A New Algorithm for High Average-utility Itemset Mining

High utility itemset mining (HUIM) is a new emerging field in data mining which has gained growing interest due to its various applications. The goal of this problem is to discover all itemsets whose utility exceeds minimum threshold. The basic HUIM problem does not consider length of itemsets in its utility measurement and utility values tend to become higher for itemsets containing more items...

متن کامل

Mining High Utility Itemsets – A Recent Survey

Association rule mining (ARM) plays a vital role in data mining. It aims at searching for interesting pattern among items in a dense data set or database and discovers association rules among the large number of itemsets. The importance of ARM is increasing with the demand of finding frequent patterns from large data sources. Researchers developed a lot of algorithms and techniques for generati...

متن کامل

High Utility Itemset Mining

Data Mining can be defined as an activity that extracts some new nontrivial information contained in large databases. Traditional data mining techniques have focused largely on detecting the statistical correlations between the items that are more frequent in the transaction databases. Also termed as frequent itemset mining , these techniques were based on the rationale that itemsets which appe...

متن کامل

An Efficient Algorithm for Mining Closed High Utility Itemset

Mining of High utility itemsets refers to discovering sets of data items that have high utilities. In recent years the high utility itemsets mining has extensive attentions due to the wide applications in various domains like biomedicine and commerce. Extraction of high utility itemsets from database is very problematic task. The formulated high utility itemset degrades the efficiency of the mi...

متن کامل

An Efficient Data Structure for Fast Mining High Utility Itemsets

Abstract: High utility itemset mining has emerged to be an important research issue in data mining since it has a wide range of real life applications. Although a number of algorithms have been proposed in recent years, there seems to be still a lack of efficient algorithms since these algorithms suffer from either the problem of low efficiency of calculating candidates’ utilities or the proble...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Expert Syst. Appl.

دوره 101  شماره 

صفحات  -

تاریخ انتشار 2018